66 research outputs found

    Characterisation and correction of signal fluctuations in successive acquisitions of microarray images

    Background: There are many sources of variation in dual labelled microarray experiments, including data acquisition and image processing. The final interpretation of experiments strongly relies on the accuracy of the measurement of the signal intensity. For low intensity spots in particular, accurately estimating gene expression variations remains a challenge as signal measurement is, in this case, highly subject to fluctuations. Results: To evaluate the fluctuations in the fluorescence intensities of spots, we used series of successive scans, at the same settings, of whole genome arrays. We measured the decrease in fluorescence and we evaluated the influence of different parameters (PMT gain, resolution and chemistry of the slide) on the signal variability, at the level of the array as a whole and by intensity interval. Moreover, we assessed the effect of averaging scans on the fluctuations. We found that the extent of photo-bleaching was low and we established that 1) the fluorescence fluctuation is linked to the resolution, i.e. it depends on the number of pixels in the spot; 2) the fluorescence fluctuation increases as the scanner voltage increases and, moreover, is higher for the red than for the green fluorescence, which can introduce bias in the analysis; 3) the signal variability is linked to the intensity level and is higher for low intensities; 4) the heterogeneity of the spots and the variability of the signal and the intensity ratios decrease when two or three scans are averaged. Conclusion: Protocols consisting of two scans, one at low and one at high PMT gains, or multiple scans (ten scans) can introduce bias or be difficult to implement. We found that averaging two, or at most three, acquisitions of microarrays scanned at moderate photomultiplier settings (PMT gain) is sufficient to significantly improve the accuracy (quality) of the data, particularly for spots having low intensities, and we propose this as a general approach. For averaging and precise image alignment at sub-pixel levels we have made a program freely available on our web-site http://bioinfome.cgm.cnrs-gif.fr to facilitate implementation of this approach.
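The scan-averaging idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' program: it assumes the scans are already aligned and reduced to one intensity value per spot, and the function names `average_scans` and `coefficient_of_variation` are hypothetical.

```python
from statistics import mean, pstdev


def average_scans(scans):
    """Return the per-spot mean intensity across successive scans.

    `scans` is a list of scans; each scan is a list of spot intensities
    in the same spot order (an assumption of this sketch).
    """
    if not scans:
        raise ValueError("need at least one scan")
    n_spots = len(scans[0])
    if any(len(scan) != n_spots for scan in scans):
        raise ValueError("all scans must cover the same spots")
    # zip(*scans) groups the values measured for the same spot.
    return [mean(spot_values) for spot_values in zip(*scans)]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (relative variability)."""
    m = mean(values)
    return pstdev(values) / m if m else float("inf")


if __name__ == "__main__":
    # Two hypothetical scans of the same three spots.
    scan_a = [100.0, 2000.0, 30000.0]
    scan_b = [140.0, 2100.0, 29000.0]
    print(average_scans([scan_a, scan_b]))
```

Computing `coefficient_of_variation` per spot before and after averaging is one simple way to check whether averaging two or three scans reduces variability on a given data set.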

    Assessing and selecting gene expression signals based upon the quality of the measured dynamics

    Background: One of the challenges with modeling the temporal progression of biological signals is dealing with the effect of noise and the limited number of replicates at each time point. Given the rising interest in utilizing predictive mathematical models to describe the biological response of an organism, or analyses such as clustering and gene ontology enrichment, it is important to determine whether the dynamic progression of the data has been accurately captured despite the limited number of replicates, such that one can have confidence that the results of the analysis are capturing important salient dynamic features. Results: By pre-selecting genes based upon quality before the identification of differential expression via an algorithm such as EDGE, it was found that the percentage of statistically enriched ontologies (p < .05) was improved. Furthermore, it was found that a majority of the genes found via the proposed technique were also selected via an EDGE selection, though the reverse was not necessarily true. It was also found that improvements offered by the proposed algorithm are anti-correlated with improvements in the various microarray platforms and the number of replicates. This is illustrated by the fact that newer arrays and experiments with more replicates show less improvement when the filtering for quality is first run before the selection of differentially expressed genes. This suggests that the increase in the number of replicates as well as improvements in array technologies are increasing the confidence one has in the dynamics obtained from the experiment. Conclusion: We have developed an algorithm that quantifies the quality of a temporal biological signal rather than whether the signal illustrates a significant change over the experimental time course. Because the use of these temporal signals, whether it is in mathematical modeling or clustering, focuses upon the entire time series, it is necessary to develop a method to quantify and select for signals which conform to this ideal. By doing this, we have demonstrated a marked and consistent improvement in the results of a clustering exercise over multiple experiments, microarray platforms, and experimental designs.

    Using Unsupervised Patterns to Extract Gene Regulation Relationships for Network Construction

    BACKGROUND: Gene expression regulation is usually described in the literature as a transcription factor X that regulates the target gene Y. Previously, some studies discovered gene regulations by using information from the biomedical literature, and most of them require the effort of human annotators to build the training dataset. Moreover, the large amount of textual knowledge recorded in the biomedical literature grows very rapidly, and the manual creation of patterns from the literature becomes more difficult. There is an increasing need to automate the process of establishing patterns. METHODOLOGY/PRINCIPAL FINDINGS: In this article, we describe an unsupervised pattern generation method called AutoPat. It is a gene expression mining system that can generate unsupervised patterns automatically from a given set of seed patterns. The high scalability and low maintenance cost of the unsupervised patterns could help our system to extract gene expression from PubMed abstracts more precisely and effectively. CONCLUSIONS/SIGNIFICANCE: Experiments on several regulators show reasonable precision and recall rates which validate AutoPat's practical applicability. The resulting regulation networks could also be built precisely and effectively. The system in this study is available at http://ikmbio.csie.ncku.edu.tw/AutoPat/

    Combining Network Modeling and Gene Expression Microarray Analysis to Explore the Dynamics of Th1 and Th2 Cell Regulation

    Two T helper (Th) cell subsets, namely Th1 and Th2 cells, play an important role in inflammatory diseases. The two subsets are thought to counter-regulate each other, and alterations in their balance result in different diseases. This paradigm has been challenged by recent clinical and experimental data. Because of the large number of genes involved in regulating Th1 and Th2 cells, assessment of this paradigm by modeling or experiments is difficult. Novel algorithms based on formal methods now permit the analysis of large gene regulatory networks. By combining these algorithms with in silico knockouts and gene expression microarray data from human T cells, we examined if the results were compatible with a counter-regulatory role of Th1 and Th2 cells. We constructed a directed network model of genes regulating Th1 and Th2 cells through text mining and manual curation. We identified four attractors in the network, three of which included genes that corresponded to Th0, Th1 and Th2 cells. The fourth attractor contained a mixture of Th1 and Th2 genes. We found that neither in silico knockouts of the Th1 and Th2 attractor genes nor gene expression microarray data from patients with immunological disorders and healthy subjects supported a counter-regulatory role of Th1 and Th2 cells. By combining network modeling with transcriptomic data analysis and in silico knockouts, we have devised a practical way to help unravel complex regulatory network topology and to increase our understanding of how network actions may differ in health and disease
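The notion of attractors in a gene regulatory network, mentioned in the abstract above, can be illustrated with a small sketch. This is a brute-force enumeration over all states of a synchronous Boolean network, not the formal-methods algorithms the abstract refers to, and the two-gene mutual-inhibition network is a hypothetical toy example rather than the Th1/Th2 model of the paper.

```python
from itertools import product


def find_attractors(update, n_genes):
    """Enumerate attractors of a synchronous Boolean network.

    `update` maps a state (tuple of bools) to the next state.
    Every possible start state is followed until a state repeats;
    the repeating part of the trajectory is an attractor.
    Only practical for small networks (2**n_genes states).
    """
    attractors = set()
    for start in product((False, True), repeat=n_genes):
        seen = {}  # state -> step at which it was first visited
        state, step = start, 0
        while state not in seen:
            seen[state] = step
            state = update(state)
            step += 1
        first = seen[state]  # step where the cycle begins
        attractors.add(frozenset(s for s, i in seen.items() if i >= first))
    return attractors


def mutual_inhibition(state):
    """Toy rule (assumption): each gene is on only if the other is off."""
    a, b = state
    return (not b, not a)


if __name__ == "__main__":
    for attractor in find_attractors(mutual_inhibition, 2):
        print(sorted(attractor))
```

In this toy network the two fixed points correspond to one gene being on and the other off, while the remaining attractor is an oscillation that arises from the synchronous update scheme. An in silico knockout can be mimicked by wrapping `update` so that one gene is always forced to `False`.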

    GenCLiP: a software program for clustering gene lists by literature profiling and constructing gene co-occurrence networks related to custom keywords

    Background: Biomedical researchers often want to explore pathogenesis and pathways regulated by abnormally expressed genes, such as those identified by microarray analyses. Literature mining is an important way to assist in this task. Many literature mining tools are now available. However, few of them allow the user to make manual adjustments to zero in on what he/she wants to know in particular. Results: We present our software program, GenCLiP (Gene Cluster with Literature Profiles), which is based on the methods presented by Chaussabel and Sher (Genome Biol 2002, 3(10):RESEARCH0055) that search gene lists to identify functional clusters of genes based on up-to-date literature profiling. Four features were added to this previously described method: the ability to 1) manually curate keywords extracted from the literature, 2) search genes and gene co-occurrence networks related to custom keywords, 3) compare analyzed gene results with negative and positive controls generated by GenCLiP, and 4) calculate probabilities that the resulting genes and gene networks are randomly related. In this paper, we show, with a set of genes differentially expressed between keloids and normal controls, how implementation of functions in GenCLiP successfully identified keywords related to the pathogenesis of keloids and unknown gene pathways involved in the pathogenesis of keloids. Conclusion: With regard to the identification of disease-susceptibility genes, GenCLiP allows one to quickly acquire a primary pathogenesis profile and identify pathways involving abnormally expressed genes not previously associated with the disease.

    Seeded Bayesian Networks: Constructing genetic networks from microarray data

    Background: DNA microarrays and other genomics-inspired technologies provide large datasets that often include hidden patterns of correlation between genes reflecting the complex processes that underlie cellular metabolism and physiology. The challenge in analyzing large-scale expression data has been to extract biologically meaningful inferences regarding these processes – often represented as networks – in an environment where the datasets are often imperfect and biological noise can obscure the actual signal. Although many techniques have been developed in an attempt to address these issues, to date their ability to extract meaningful and predictive network relationships has been limited. Here we describe a method that draws on prior information about gene-gene interactions to infer biologically relevant pathways from microarray data. Our approach consists of using preliminary networks derived from the literature and/or protein-protein interaction data as seeds for a Bayesian network analysis of microarray results. Results: Through a bootstrap analysis of gene expression data derived from a number of leukemia studies, we demonstrate that seeded Bayesian Networks have the ability to identify high-confidence gene-gene interactions which can then be validated by comparison to other sources of pathway data. Conclusion: The use of network seeds greatly improves the ability of Bayesian Network analysis to learn gene interaction networks from gene expression data. We demonstrate that the use of seeds derived from the biomedical literature or high-throughput protein-protein interaction data, or the combination, provides improvement over a standard Bayesian Network analysis, allowing networks involving dynamic processes to be deduced from the static snapshots of biological systems that represent the most common source of microarray data. Software implementing these methods has been included in the widely used TM4 microarray analysis package.

    Combining active learning and semi-supervised learning techniques to extract protein interaction sentences

    Background: Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning (AL) and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining AL with semi-supervised learning (SSL) to improve the performance of the PPI task. Methods: We propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection, which boosts the system performance significantly. Results: By conducting experiments with three different PPI corpora, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs, such as Random Sampling, Clustering, and Transductive SVMs, in precision, recall, and F-measure. Conclusions: Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.

    Improving protein function prediction methods with integrated literature data

    Background: Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity. Results: We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges, since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably-sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10%, which yield the best trade-off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder, but co-occurrence data still proves beneficial. Conclusion: Co-occurrence data is a valuable supplemental source for graph-theoretic function prediction algorithms. A rapidly growing literature corpus ensures that co-occurrence data is a readily-available resource for nearly every studied organism, particularly those with small protein interaction databases. Though arguably biased toward known genes, co-occurrence data provides critical additional links to well-studied regions in the interaction network that graph-theoretic function prediction algorithms can exploit.

    Identification and Analysis of Co-Occurrence Networks with NetCutter

    BACKGROUND: Co-occurrence analysis is a technique often applied in text mining, comparative genomics, and promoter analysis. The methodologies and statistical models used to evaluate the significance of association between co-occurring entities are quite diverse, however. METHODOLOGY/PRINCIPAL FINDINGS: We present a general framework for co-occurrence analysis based on a bipartite graph representation of the data, a novel co-occurrence statistic, and software performing co-occurrence analysis as well as generation and analysis of co-occurrence networks. We show that the overall stringency of co-occurrence analysis depends critically on the choice of the null-model used to evaluate the significance of co-occurrence and find that random sampling from a complete permutation set of the bipartite graph permits co-occurrence analysis with optimal stringency. We show that the Poisson-binomial distribution is the most natural co-occurrence probability distribution when vertex degrees of the bipartite graph are variable, which is usually the case. Calculation of Poisson-binomial P-values is difficult, however. Therefore, we propose a fast bi-binomial approximation for calculation of P-values and show that this statistic is superior to other measures of association such as the Jaccard coefficient and the uncertainty coefficient. Furthermore, co-occurrence analysis of more than two entities can be performed using the same statistical model, which leads to increased signal-to-noise ratios, robustness towards noise, and the identification of implicit relationships between co-occurring entities. Using NetCutter, we identify a novel protein biosynthesis related set of genes that are frequently coordinately deregulated in human cancer related gene expression studies. NetCutter is available at http://bio.ifom-ieo-campus.it/NetCutter/.
CONCLUSION: Our approach can be applied to any set of categorical data where co-occurrence analysis might reveal functional relationships, such as clinical parameters associated with cancer subtypes or SNPs associated with disease phenotypes. The stringency of our approach is expected to offer an advantage in a variety of applications.
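Two of the quantities named in the abstract above can be sketched in a few lines. The sketch shows the Jaccard coefficient and an exact upper-tail probability of a Poisson-binomial distribution computed by dynamic programming. It does not implement NetCutter's bi-binomial approximation, and it assumes the caller already has a per-document probability that the two entities co-occur by chance; the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def poisson_binomial_upper_tail(probs, k):
    """P(X >= k), where X is a sum of independent Bernoulli(p_i) trials.

    Exact dynamic programming over the trials, O(n^2) time; this is the
    slow reference computation, not a fast approximation.
    """
    pmf = [1.0]  # distribution of the count after zero trials
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for count, mass in enumerate(pmf):
            nxt[count] += mass * (1.0 - p)  # trial fails
            nxt[count + 1] += mass * p      # trial succeeds
        pmf = nxt
    return sum(pmf[max(k, 0):])


if __name__ == "__main__":
    # Hypothetical example: documents in which genes g1 and g2 appear.
    docs_g1 = {1, 2, 3}
    docs_g2 = {2, 3, 4}
    print(jaccard(docs_g1, docs_g2))
    # Hypothetical per-document chance co-occurrence probabilities.
    print(poisson_binomial_upper_tail([0.1, 0.2, 0.05, 0.3], 2))
```

The quadratic cost of the exact computation gives some intuition for why a fast approximation is attractive when many entity pairs have to be tested.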

    Pathway-Based Evaluation in Early Onset Colorectal Cancer Suggests Focal Adhesion and Immunosuppression along with Epithelial-Mesenchymal Transition

    Colorectal cancer (CRC) has one of the highest incidences among all cancers. The majority of CRCs are sporadic cancers that occur in individuals without family histories of CRC or inherited mutations. Unfortunately, whole-genome expression studies of sporadic CRCs are limited. A recent study used microarray techniques to identify a predictor gene set indicative of susceptibility to early-onset CRC. However, the molecular mechanisms of the predictor gene set were not fully investigated in the previous study. To understand the functional roles of the predictor gene set, in the present study we applied a subpathway-based statistical model to the microarray data from the previous study and identified mechanisms that are reasonably associated with the predictor gene set. Interestingly, significant subpathways belonging to 2 KEGG pathways (focal adhesion; natural killer cell-mediated cytotoxicity) were found to be involved in the early-onset CRC patients. We also showed that the 2 pathways were functionally involved in the predictor gene set using a text-mining technique. Entry of a single member of the predictor gene set triggered a focal adhesion pathway, which confers anti-apoptosis in the early-onset CRC patients. Furthermore, intensive inspection of the predictor gene set in terms of the 2 pathways suggested that some entries of the predictor gene set were implicated in immunosuppression along with epithelial-mesenchymal transition (EMT) in the early-onset CRC patients. In addition, we compared our subpathway-based statistical model with a gene set-based statistical model, MIT Gene Set Enrichment Analysis (GSEA). Our method showed better performance than GSEA in the sense that our method was more consistent with a well-known cancer-related pathway set. 
Thus, the biological suggestion generated by our subpathway-based approach seems quite reasonable and warrants a further experimental study on early-onset CRC in terms of dedifferentiation or differentiation, which is underscored in EMT and immunosuppression